Ph.D. Student, Computer Science
Nanyang Technological University, Singapore


About Me

I am a third-year Ph.D. student, fortunate to be advised by Prof. Ziwei Liu. My research focuses on multimodal models and building true intelligence.

I am lucky to work with many brilliant researchers at LMMs-Lab, a non-profit, research-oriented organization where we share a sincere passion for developing multimodal intelligence.

Email: drluodian[at]gmail[dot]com


Selected Publications
  1. Bo Li*, Chen Change Loy, Fanyi Pu, Jingkang Yang, Kaichen Zhang*, Kairui Hu, Luu Minh Thang*, Nguyen Quang Trung*, Pham Ba Cong*, Shuai Liu, Yezhen Wang*, Ziwei Liu. Aero-1-Audio. Technical Blog.
    Open models for a wide range of audio tasks, trained on only 50K hours of data yet achieving excellent performance, suggesting that smart data beats massive training; served as development lead.
  2. Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li. LLaVA-OneVision: Easy Visual Task Transfer. Transactions on Machine Learning Research (TMLR), 2025.
    SOTA-level, fully open models (models/data/code) achieving GPT-4o-level performance across 30+ image and video tasks. Led the codebase, data curation, and evaluation.
  3. Kaichen Zhang*, Bo Li*, Peiyuan Zhang*, Fanyi Pu*, Joshua Adrian Cahyono*, Kairui Hu*, Shuai Liu*, Yuanhan Zhang*, Jingkang Yang*, Chunyuan Li*, Ziwei Liu*. LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models. North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
    Open-source evaluation framework spanning text, image, video, and audio tasks, with 2.4K GitHub stars; contributed the core framework and major code.
  4. Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, Yong Jae Lee. LLaVA-NeXT: Improved Reasoning, OCR, and World Knowledge. Technical Blog.
    First open models achieving GPT-4V-level performance, trained for 24 hours on 32 A100 GPUs. Proposed the idea that massive evaluation leads to better models.
    [code]
  5. Bo Li*, Yuanhan Zhang*, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu. MIMIC-IT: Multi-modal In-Context Instruction Tuning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
    Early (2023-10) experiment on a vision-language-agent (VLA) model with RLHF; proposed the idea and drafted the training code.
    [code]
  6. Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu. Benchmarking and Analyzing Generative Data for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
    Early (2022-12) experiment using synthetic data for visual recognition.
    [code]
  7. Bo Li*, Yifei Shen*, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu. Sparse Mixture-of-Experts are Domain Generalizable Learners. ICLR 2023 (Oral), In International Conference on Learning Representations.
    Among the first (2022-05) theoretical analyses of the mixture-of-experts architecture from a generalization perspective.
    [code]
    Short version in NeurIPS 2022 Workshop on Distribution Shifts.
  8. Liangyu Chen*, Bo Li*, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu. Coordinating Multiple Vision-Language Models for Visual Reasoning. NeurIPS 2023, In Conference on Neural Information Processing Systems.
    Short version in ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo).
  9. Bo Li, Yifei Shen, Yezhen Wang, Wenzhen Zhu, Dongsheng Li, Kurt Keutzer, Han Zhao. Invariant Information Bottleneck for Domain Generalization. AAAI 2022, In Proceedings of the AAAI Conference on Artificial Intelligence. [code]
  10. Yezhen Wang, Bo Li, Tong Che, Kaiyang Zhou, Ziwei Liu, Dongsheng Li. Energy-Based Open-World Uncertainty Modeling for Confidence Calibration. ICCV 2021, In Proceedings of the IEEE/CVF International Conference on Computer Vision. [code]
  11. Bo Li, Yezhen Wang, Shanghang Zhang, Dongsheng Li, Kurt Keutzer, Trevor Darrell, Han Zhao. Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation. CVPR 2021, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [code]
  12. Sicheng Zhao, Bo Li, Pengfei Xu, Xiangyu Yue, Guiguang Ding, Kurt Keutzer. MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation. IJCV 2021, International Journal of Computer Vision.
  13. Bo Li, Yezhen Wang, Tong Che, Shanghang Zhang, Yoshua Bengio, Kurt Keutzer. Rethinking Distributional Matching Based Domain Adaptation. arXiv preprint arXiv:2006.13352.
  14. Sicheng Zhao*, Bo Li*, Xiangyu Yue*, Yang Gu, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer. Multi-source Domain Adaptation for Semantic Segmentation. NeurIPS 2019, In Neural Information Processing Systems. [code]
Experiences

I have been fortunate to collaborate and conduct research at/with:


Professional Services
  • Talk/Technical Sharing:
    • Multimodal Models, LMMs-Lab Projects@Jump Trading (2025), Hosted by Weifeng Liu.
    • Guest Lecture: Multimodal Models@UMich, UM EECS 542: Advanced Topics in Computer Vision, Hosted by Prof. Stella X. Yu.
    • Multimodal Models, LMMs-Lab Projects@TwelveLabs (2024), Hosted by James Le.
    • Multimodal Models, LMMs-Lab Projects@TikTok (2024).
    • Otter & MIMIC-IT@Alibaba DAMO Academy, Hosted by Dr. Lidong Bing, Sep. 2023.
    • Otter & MIMIC-IT@HITSZ, Hosted by Prof. Rui Shao, Jul. 2023.

  • S-Lab@NTU: Cluster Administrator (70+ users, 400+ GPUs)


  • Conference Reviewer / Program Committee:

    • ICCV (2021, 2023), NeurIPS (2022), BMVC (2023), AAAI (2023), CVPR (2022, 2023), AISTATS (2023), ICML (2023).

    • Workshop: ICLR 2023 (DG)


  • Journal Reviewer:

    • Pattern Recognition (PR)
    • IEEE Transactions on Multimedia (TMM)
    • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
    • International Journal of Computer Vision (IJCV)

Acknowledgements: this website builds on al-folio and Jiaming Song's website.